1. INTRODUCTION

Data packet analysis is one of the most frequently mentioned techniques in the network forensics branch of
digital forensics [1]. In practice, many data packet analysis cases target instant messaging (IM)
application software. For instance, one network forensic case focused on discovering the IP address of
the target by analysing the data packets of WhatsApp, which is a popular IM platform [2]. Through these
cases, researchers and forensic engineers have gradually built up a clear understanding of the
importance of forensics targeting IM software, especially given that, in modern society,
people who own smart devices may prefer to use one or more IM applications to communicate with
each other. This may also apply to criminals or commercial spies who choose to use IM software to
transfer critical information.

Tsai et al. [3] suggest that the messaging content and the metadata contained in the data packets
generated by IM software have become one of the most critical types of evidence for law enforcement to
obtain in investigations of certain variations of cybercrime. However, we gradually found
out that, in this industry, there are rarely similar forensic cases that target Chinese IM software or social
media network platforms, especially for one platform that is very frequently used: QQ.

According to Huang et al. [4], who focus on the social impact of QQ, QQ had become one of the most
popular social networks and instant messaging platforms in China, with millions of active users in 2013. In
addition to this, to quote from the official web page of QQ International [5]: (QQ has become) "the most popular
personal communications app in history: over 1,000,000,000 registered users across 80+ countries".
Considering that there are existing studies that target other IM software, some key points of the
research methodology and experiment design may be referenced from these cases. Based on some
related works [2], [3] that have used WhatsApp as the main target of digital forensics, in addition to a few
other research and online articles [6], [7], a basic research framework that may fit this project has been
proposed.

For the majority of IM applications, including WhatsApp and QQ, when users use their
preferred smart devices to send messages to other users or other accounts, the message is encoded as a
data packet by the software. These data packets are then sent by the device through a series of
network stacks to the main server of the IM platform, from where they are
transported to the target user. In this whole communication process, the mechanism that the
software uses to process these data packets is the key difference between these IM applications. This
mechanism determines how the information is stored and distributed, including the activity patterns of the data
flows and the structure of some critical information that the majority of IM applications require to
establish the connection with other users, for example, the internet protocol (IP) address or the account name
of the sender and the receiver. By capturing the data packets generated and transferred by the IM software
with proper tools, and analysing how the information is handled, digital forensics engineers
can then use this knowledge as a formatter to extract useful information from the captured data packets,
especially considering that these captured data packets are just bit streams that are not in a
human-readable form.
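
To illustrate what using protocol knowledge "as a formatter" means, the following minimal Python sketch
decodes the fixed 20-byte part of an IPv4 header from raw bytes and recovers the sender and receiver
addresses. The function name parse_ipv4_header and the sample bytes are hypothetical and are not taken
from the experiment; the field layout follows the standard IPv4 header.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": {6: "TCP", 17: "UDP"}.get(proto, str(proto)),
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hypothetical header: 192.168.1.10 -> 10.0.0.1, carrying UDP (checksum left as zero).
sample = bytes.fromhex("4500001c000040004011" "0000" "c0a8010a" "0a000001")
info = parse_ipv4_header(sample)
```

The same idea applies one layer up: once the layout of an application protocol is known, its fields can
be read out of the payload in the same way.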


2. IMPLEMENTATION TOOLS

Wireshark is a popular open-source tool that is often used by network engineers and digital forensics
engineers. It is packed with a set of powerful tools that, with the proper configuration, can capture and
analyse the data packets of any device [8]. Because it provides an intuitive graphical user interface
(GUI), the information in the captured packets can be displayed directly in an
inspection window, and the built-in filter functionality allows the user to filter out useless
packets and display only the useful ones, which greatly reduces the complexity of the result analysis phase of
this project.

In order to provide a relatively maintainable experiment environment, Kali Linux is chosen
as the main experiment platform for this project. Kali Linux is a Debian-based Linux operating system
packed with a set of professional tools oriented towards different branches of digital forensics.
According to the online documentation provided by the Kali maintainers [9], Wireshark is pre-installed and
well configured so that the tool works out of the box. Furthermore, the lightweight design of the operating
system allows it to be installed as a virtual machine or as a to-go system, which is designed to be installed on
a flash drive and is more convenient for digital forensics engineers because it will not introduce
extra problems that may affect the result of the forensic operations. This quality of the Kali Linux
operating system is especially helpful to the experiment.

In one of the cases targeting WhatsApp [2], the researchers suggested using a "middle
man" on the local area network (LAN) to sniff the data packets that are transferred by the IM software.
Inspired by this research, in this project, a classic technique often used by computer security experts,
the man-in-the-middle attack based on address resolution protocol (ARP) poisoning described by Singh et al. [10],
will be used to let the Kali Linux machine capture the data packets sent from and to the target.
After ARP poisoning is initiated against the target machine, the target will recognise the attacker machine (in this
project, the Kali Linux machine) as the gateway of the LAN and send all of its network traffic to the
attacker machine, so that all the data packets originating from the target machine go through the attacker
machine.

At this point, all the attacker machine needs to do is listen and capture all the local data
packets. To implement this technique, two tools that are also provided by Kali Linux can be used:
arpspoof, which only provides a command line interface, and ettercap, which provides an intuitive GUI.
After brief research on these two tools [11], [12], and considering that arpspoof can be initiated
by a simple terminal command, which can streamline the experiment process, this project selects
arpspoof as the tool to "redirect" the packets from the target. The basic usage of this tool can be summarised by the
following command:
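
A typical arpspoof invocation has the form: arpspoof -i <interface> -t <target> <host>. As a minimal
sketch, the argument list could be assembled in Python as shown below. The interface name eth0, the
two addresses, and the helper name build_arpspoof_argv are hypothetical and are not the values used in
the experiment.

```python
import ipaddress
import shlex
from typing import List


def build_arpspoof_argv(interface: str, target_ip: str, gateway_ip: str) -> List[str]:
    """Return the argument list for: arpspoof -i <interface> -t <target> <host>."""
    # Validate the addresses early; ipaddress raises ValueError on malformed input.
    target = ipaddress.IPv4Address(target_ip)
    gateway = ipaddress.IPv4Address(gateway_ip)
    # The target is told that the attacker machine is the gateway (the <host> argument).
    return ["arpspoof", "-i", interface, "-t", str(target), str(gateway)]


# Hypothetical lab values: target VM 192.168.1.20, LAN gateway 192.168.1.1.
argv = build_arpspoof_argv("eth0", "192.168.1.20", "192.168.1.1")
command_line = shlex.join(argv)
```

The sketch only builds the command; running it requires root privileges and should only be done on a
laboratory network that the experimenter is authorised to test.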


In addition to this, the Kali Linux machine needs to enable its IP forwarding capability
before initialising the ARP poisoning so that the target machine can still connect to the Internet. In Kali
Linux, this functionality can be enabled by modifying a configuration file in the system using the following
command:
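
On Linux, the kernel exposes the IPv4 forwarding switch as the file /proc/sys/net/ipv4/ip_forward, where
the value 1 means enabled. The following sketch, which assumes this standard interface rather than the
exact command used in the experiment, reads and sets the flag; the helper names are hypothetical, and
the demonstration runs against a temporary file so that it does not touch the real system setting.

```python
import tempfile
from pathlib import Path

# Standard Linux location of the IPv4 forwarding flag (writing needs root).
IP_FORWARD_PATH = Path("/proc/sys/net/ipv4/ip_forward")


def is_forwarding_enabled(path: Path = IP_FORWARD_PATH) -> bool:
    """Return True when the flag file contains '1'."""
    return path.read_text().strip() == "1"


def set_forwarding(enabled: bool, path: Path = IP_FORWARD_PATH) -> None:
    """Write '1' or '0' to the flag file."""
    path.write_text("1\n" if enabled else "0\n")


# Demonstration on a stand-in file instead of the real /proc entry.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "ip_forward"
    demo.write_text("0\n")
    before = is_forwarding_enabled(demo)
    set_forwarding(True, demo)
    after = is_forwarding_enabled(demo)
```

A value written this way does not persist across reboots, which suits a short-lived experiment.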
3. METHOD AND RESULTS
3.1. Experiment environment

To analyse the data packet activities of the QQ software, the basic methodology is to study the
common user behaviours of the software and mimic these behaviours during the experiment, while using
the tools mentioned above to record the data packets sent by QQ. After observing normal QQ user activities, and
drawing on the experience of the authors, who are QQ users in real life, the following commonly seen use
cases of the QQ software can be identified:
- User logs in to the user's QQ account,
- User checks the friends list,
- User sends text to and receives text from another friend (another user account),
- User logs out of the account.

Although the QQ desktop application itself also provides other functionalities, such as
Qzone, which is similar to the user profile and user space functionality of Facebook, this project only
focuses on the instant messaging capability of QQ. Initially, the experiment of this project was intended to use
the QQ for Linux software as the target of packet analysis, considering that the experiment is mainly
conducted on the Kali Linux operating system. Unfortunately, according to multiple articles and forum
discussions [13]-[15] and the limited support from the official QQ for Linux web page [16], there are
inconsistent user experiences and stability issues reported for the official QQ for Linux version, which
also lacks support from Tencent. According to user feedback and discussions on
various Chinese social platforms and in the Linux community [17], [18], the user experience of QQ for Linux
is "not very pleasant". There is also a version of QQ ported from Windows that is developed and maintained by
the deepin Linux community [19], but due to an incompatibility issue related to certain software
dependencies of Kali Linux, the installation process of the deepin version of QQ caused a system
dependency failure (Figure 1). Furthermore, considering the market share of the desktop operating systems
available on the market [20] and the download count data of the QQ Windows version in China
[21], the majority of users may consider installing the Windows version of QQ on the more widely
used Windows operating system.


Figure 1. Bad user interface of QQ for Linux 


To get more appropriate results that cover the majority of QQ software use cases, our
experiment uses two virtual machines as the experiment environment (Figure 2). Detailed configurations
of the two virtual machines (VMs) are shown in Table 1. One VM runs the Kali Linux
operating system with the required software installed; the other VM runs the Windows 10 operating system
with QQ for Windows installed.

The host with Kali Linux installed intercepts, captures, and forwards all the data packets sent from
and to the Windows machine, using proper virtual machine network configurations and the ARP spoofing
technique with the arpspoof tool. Wireshark is then used on the Kali Linux host to analyse the activities
and content of these captured data packets. Before the experiment starts, the pre-installed ping tool is
used to check the Internet and LAN connectivity of both machines.

Figure 2. Network topology


Table 1. Machines configuration
[Table body not recoverable; it lists the installed software of each machine, including Wireshark, arpspoof, and QQ.]


3.2. Results


3.2.1. TCP and UDP connection in the transport layer

With the proper configuration of the virtual machines and the arpspoof tool, the Kali Linux machine successfully
retrieved all the data packets sent from the Windows machine. After analysing the content of the data packets,
using the built-in filter of Wireshark to isolate the target packets, we conclude that QQ uses the transmission
control protocol (TCP) and the user datagram protocol (UDP) in the transport layer of the open
systems interconnection (OSI) network model to establish the connection with the main server and exchange
data. One of the main network ports used is port 8080 (Figure 3). The most obvious
evidence is that the QQ software provides an advanced settings page in the account log-in window that allows
users to choose whether to use TCP or UDP to connect to the log-in server, and provides a list of
selectable servers to choose from.

In the "User account log in" experiment, after trying different connection methods (TCP or
UDP) and different log-in servers to log in to the QQ account, the destination IP address of certain data
packets captured by Wireshark is the same as the one selected in the log-in window.
The value of the "protocol" attribute of these packets also corresponds to the selected one
(Figure 4).
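
The filtering step described above can be sketched as small helpers that build Wireshark display filter
strings. The field names tcp.port, udp.port, tcp.dstport, udp.dstport and ip.dst are standard Wireshark
display filter fields; the helper names and the server address 203.0.113.5 are hypothetical and do not
come from the experiment.

```python
def port_filter(port: int) -> str:
    """Match TCP or UDP packets whose source or destination port equals `port`."""
    if not 0 < port < 65536:
        raise ValueError("port must be in 1..65535")
    return f"(tcp.port == {port} || udp.port == {port})"


def login_filter(server_ip: str, protocol: str, port: int) -> str:
    """Match packets sent to the selected log-in server over the selected protocol."""
    proto = protocol.lower()
    if proto not in ("tcp", "udp"):
        raise ValueError("protocol must be 'tcp' or 'udp'")
    return f"ip.dst == {server_ip} && {proto}.dstport == {port}"


# Port 8080 is the port reported in the text; the server address is a placeholder.
any_8080 = port_filter(8080)
tcp_login = login_filter("203.0.113.5", "TCP", 8080)
```

The resulting strings can be pasted into the Wireshark filter bar to compare the captured destination
address and protocol with the values selected in the log-in window.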
Figure 3. TCP connection captured by Wireshark


Figure 4. Log in page


3.2.2. HTTP and HTTPS connection in the application layer

After logging in, QQ starts to fetch and load assets related to the account, including account
avatar images and the friends list. The data packets that carry the uniform resource
identifiers (URIs) of these assets use the hypertext transfer protocol (HTTP) or hypertext transfer protocol secure
(HTTPS). By analysing the HTTP response messages of these data packets in the Wireshark
window (Figure 5), these assets can be extracted outside the QQ software, but additional techniques related to
cookies may be required because it seems that, to fetch the assets from these URIs, a specific cookie, which is
generated by the QQ software for the currently logged-in user account, is required.
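
As a minimal sketch of the cookie-related step, the following Python code extracts the cookies from the
text of a captured, unencrypted HTTP request. The request text, host name, and cookie names are
hypothetical placeholders; the cookie names that QQ actually uses are not given in this paper.

```python
from http.cookies import SimpleCookie


def extract_request_cookies(raw_request: str) -> dict:
    """Return the name/value pairs found in the Cookie header(s) of an HTTP request."""
    head, _, _body = raw_request.partition("\r\n\r\n")
    cookies = {}
    # Skip the request line, then look for Cookie headers (case-insensitive).
    for line in head.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "cookie":
            jar = SimpleCookie()
            jar.load(value.strip())
            cookies.update({key: morsel.value for key, morsel in jar.items()})
    return cookies


# Hypothetical captured request; not real QQ traffic.
captured = (
    "GET /avatar/1.png HTTP/1.1\r\n"
    "Host: example.invalid\r\n"
    "Cookie: session_id=abc123; theme=dark\r\n"
    "\r\n"
)
found = extract_request_cookies(captured)
```

This only works for plain HTTP traffic; for HTTPS the headers are encrypted and are not visible in the
capture without the session keys.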


Figure 5. HTTP request captured by Wireshark 


3.2.3. OICQ protocol

Another application layer protocol that QQ uses is the OICQ protocol (Perl extension for QQ instant
messaging protocol), which was designed by Tencent. The OICQ protocol is also readable and is displayed
in a formatted form in the Wireshark window. Meanwhile, the official documentation of Wireshark provides a list of
suggested filters [22]. As shown in the screenshot below, by applying a filter
in Wireshark, all the packets that use the OICQ protocol are shown in the inspection window (Figure 6).




Figure 6. Wireshark inspection window


By reading the given information, the overall structure of the OICQ protocol can be summarised as:

a. Packet flag that indicates that this is an OICQ packet.

b. Version number (in hexadecimal) of the software that the current user is using.

c. Command number that indicates what operation this data packet is holding. According to the above
screenshot, this packet is responsible for sending a "Get friend online" request to the server.
In this experiment, with a series of user behaviours being mimicked, the following OICQ command numbers
were collected by analysing the content of all the OICQ packets:
- Request KEY (29),
- Heart Message (2): keep the login status alive,
- Get status (13),
- Get status of friend (129),
- Get friend's status of group (181),
- Request extra information (101),
- Update User information (4),
- Get level (92): get the account level,
- Receive message (23),
- MEMO Operation (62),
- Signature operation (103),
- Group name operation (60),
- Log out (1).

d. Data sequence number. For some commands sent from QQ, if the received data is too big to be handled by
a single packet, the data is split into a few different parts, each of which is
carried by a different packet. By using this sequence number, the software can rearrange the data and
then display it to the users.

e. OICQ number. If the sender is the client (software user), this is the QQ account number of this user. It is
a string of decimal digits. Considering that this experiment uses the personal QQ account of the
author himself, the account number, which is covered by a red block in the above image, is not given.

f. Other data.
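
The structure listed above can be sketched as a small Python parser. The field widths used here (a
1-byte flag with value 0x02, a 2-byte version, a 2-byte command, a 2-byte sequence number, and a 4-byte
account number, all big-endian) are an assumption based on how Wireshark presents these fields and are
not stated in this paper. The command names are the ones collected in the experiment; the sample
payload and the account number 123456789 are hypothetical.

```python
import struct

OICQ_FLAG = 0x02   # assumed flag value marking an OICQ packet
HEADER_LEN = 11    # 1 + 2 + 2 + 2 + 4 bytes under the assumed layout

# Command numbers observed in the experiment (decimal), as listed in the text.
COMMANDS = {
    1: "Log out", 2: "Heart Message", 4: "Update User information",
    13: "Get status", 23: "Receive message", 29: "Request KEY",
    60: "Group name operation", 62: "MEMO Operation", 92: "Get level",
    101: "Request extra information", 103: "Signature operation",
    129: "Get status of friend", 181: "Get friend's status of group",
}


def parse_oicq(payload: bytes) -> dict:
    """Split an OICQ payload into its header fields and the remaining data."""
    if len(payload) < HEADER_LEN:
        raise ValueError("payload too short for an OICQ header")
    flag, version, command, sequence, qq_number = struct.unpack(
        "!BHHHI", payload[:HEADER_LEN])
    if flag != OICQ_FLAG:
        raise ValueError("not an OICQ packet (unexpected flag)")
    return {
        "version": f"0x{version:04x}",
        "command": command,
        "command_name": COMMANDS.get(command, "unknown"),
        "sequence": sequence,
        "qq_number": qq_number,
        "data": payload[HEADER_LEN:],  # still encrypted; left untouched
    }


# Hypothetical payload: version 0x3761, command 29, sequence 0x1a2b, account 123456789.
sample = bytes.fromhex("02" "3761" "001d" "1a2b" "075bcd15") + b"\xde\xad\xbe\xef"
packet = parse_oicq(sample)
```

Packets that belong to one oversized message could then be ordered by their sequence field before the
data parts are joined.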

Unlike other assets that use the HTTP and HTTPS protocols to carry data, the OICQ protocol is mainly
responsible for carrying assets that contain more sensitive information, including but not limited to chat data
(text or images transferred in the chat window) and the friends list. Considering the importance of these data,
according to [23], QQ seems to use a multiple-round encryption algorithm called the tiny encryption
algorithm (TEA) to encrypt this part of the data. This is also supported by the result
obtained from the experiment (Figure 7).
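
For reference, the textbook form of TEA operates on a 64-bit block with a 128-bit key. The sketch below
implements that textbook cipher with 32 rounds. The number of rounds, the key derivation, and the block
chaining mode that QQ actually uses are not given in this paper, so this is not a decoder for the
captured OICQ data; the key and plaintext values are hypothetical.

```python
import struct

DELTA = 0x9E3779B9   # TEA key schedule constant
MASK = 0xFFFFFFFF    # keep all arithmetic within 32 bits


def tea_encrypt_block(block: bytes, key: bytes, rounds: int = 32) -> bytes:
    """Encrypt one 8-byte block with a 16-byte key using textbook TEA."""
    v0, v1 = struct.unpack("!2I", block)
    k0, k1, k2, k3 = struct.unpack("!4I", key)
    total = 0
    for _ in range(rounds):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return struct.pack("!2I", v0, v1)


def tea_decrypt_block(block: bytes, key: bytes, rounds: int = 32) -> bytes:
    """Invert tea_encrypt_block by running the rounds backwards."""
    v0, v1 = struct.unpack("!2I", block)
    k0, k1, k2, k3 = struct.unpack("!4I", key)
    total = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - DELTA) & MASK
    return struct.pack("!2I", v0, v1)


# Hypothetical key and plaintext, used only to show the round trip.
key = bytes(range(16))
plaintext = b"QQ-chat!"
ciphertext = tea_encrypt_block(plaintext, key)
recovered = tea_decrypt_block(ciphertext, key)
```

Without the session key, the ciphertext gives a capture tool nothing to decode, which is consistent
with the undecoded data observed in the experiment.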


Figure 7. Undecoded information of OICQ packets


When checking the "other data" part in the inspection window of Wireshark, all the data, no matter
what "command number" it is responsible for, cannot be decoded by Wireshark. A research project [24]
that analysed the encryption mechanisms used by a few other IM
applications, including Facebook IM and Yahoo IM, suggests that the messages may be decrypted by using
other digital forensics tools to acquire the messenger database files stored on the device itself. However,
considering that these operations go beyond the purpose of our research, further attempts to decrypt
these QQ messages were not conducted.


4. FUTURE WORK 

Due to the rapid development and evolution of IM software on the market, not only do more
IM applications or platforms with different characteristics keep being introduced, but some existing
products have also developed more variations of the software. This includes QQ: the developers at
Tencent have also produced other variations of IM software that focus on different
functionalities. For example, TIM [25], which uses the same account system as QQ, is more focused on team
discussions for company and enterprise users.

In addition to this, QQ also provides software versions for the Linux, macOS, Android and iOS
platforms. The experiment results from this project may provide constructive suggestions to other
projects focused on digital forensics related to these variations of QQ, but different techniques
may be required. Taking the mobile Android version of QQ as an example, an Android virtual device and other
related SDKs may be required when trying to capture and analyse the data packets via the Android operating
system [26]. Meanwhile, there is also a related study by Hao's team [27] that provides a relatively
different research and experiment framework for the digital forensics of IM applications on Android
devices. Considering the noticeable difference between the operating system structures of mobile Android
and Windows 10, the encryption method or the structure of the data packets generated by the QQ software for
these two platforms may be different.

Another application that is also worth mentioning is WeChat [28], which is also developed by
Tencent and is more well known to overseas users, especially since, unlike QQ, WeChat continues to support
the progressive web application (PWA) version of the software [29], which allows users who do not have the
WeChat software pre-installed on their devices to easily log in to the account. Considering the difference
between the native software application and the PWA [30], the PWA version of WeChat may
implement different data packet exchange protocols. Kali Linux also has a set of tools that provide
packet analysis testing targeting PWAs [31], and these tools may help conduct digital forensics
targeting the PWA version of WeChat.

Similarly, one of the studies referenced by this research [2] analysed the VoIP
protocol that is used by the voice call functionality of WhatsApp. The research framework and
experiment design from this study may provide useful suggestions for further studies that focus on
analysing other functionalities of the QQ software.

As previously mentioned, QQ uses HTTP and HTTPS to fetch certain assets related to the user
account, and these assets require extra cookie handling to fetch. Considering that Wireshark can also capture
the cookie information used by these data packets, by using this cookie information in cooperation with
crawler techniques [32], forensic engineers may be able to use the fetched asset URIs and the corresponding
cookie information to fetch more useful information about the target QQ accounts.
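
A minimal sketch of this idea, using only the Python standard library, is to attach the captured cookie
values to a request for a captured asset URI. The URI, the cookie names, and the helper name
build_asset_request are hypothetical placeholders, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def build_asset_request(uri: str, cookies: dict) -> Request:
    """Build a GET request for `uri` that carries the given cookies."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(uri, headers={"Cookie": cookie_header}, method="GET")


# Hypothetical values standing in for a captured asset URI and its cookies.
request = build_asset_request(
    "http://example.invalid/avatar/1.png",
    {"session_id": "abc123", "theme": "dark"},
)
sent_cookie_header = request.get_header("Cookie")
```

Passing the request to urllib.request.urlopen would perform the fetch; in a forensic setting this should
only be done with proper legal authorisation, since it acts on behalf of the account owner.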

A bonus proposal worth mentioning is that a powerful tool called Burp Suite, which
focuses on web application penetration testing and forensics, is also provided in Kali Linux. This tool is capable of
applying cookies to fetch additional data from HTTP requests [31], [32]. Meanwhile, although the
previously mentioned experiment results suggest that part of the critical information related to
the QQ user accounts is encrypted by the QQ software using the TEA algorithm, when analysing the
structure of the OICQ protocol, a small amount of unencrypted information can still be extracted. For
example, the OICQ (QQ account) number is still decodable by Wireshark. In real-life digital forensics or
man-in-the-middle attack cases, digital forensics engineers or attackers can still see this part of the
data and obtain the QQ account number of the target. This potential problem is crucial, especially
considering that the QQ account number can be treated as one's private personal information. To increase
security, the OICQ protocol may require more complete encryption for its other parts.


5. CONCLUSIONS 


In this project, through the analysis of QQ data packets, the general characteristics of the data packet
structure and activities used by the QQ software were discovered. This includes the basic
communication methods and the OICQ protocol that QQ primarily uses. Although the results that the experiment
of this project yields are still very limited, the authors believe that the discussion of the research model of
this project and the preliminary results that the experiment outlines may provide useful suggestions or
research directions for other related research, especially for digital forensics or cybersecurity projects
that also target QQ or other variants of the QQ software. This project has only examined the text messaging
function of the QQ software. In practice, QQ users may choose to communicate with their friends via
voice or video call. Meanwhile, QQ also provides other functionalities such as file sharing and video
conferencing. The protocols that these functions use have not yet been examined.